{smcl}
{* *! version 1.2  09dec2017 Michael D Barker Felix Pöge}{...}
{cmd:help ustrdist}
{hline}

{title:Title}

{phang}
{cmd:ustrdist} {hline 2} Calculate the Levenshtein distance, or edit distance, between strings.


{title:Syntax}

{p 8 17 2}
{cmd:ustrdist}
{c -(}{varname:1}|{cmd:"}{it:string1}{cmd:"}{c )-}
{c -(}{varname:2}|{cmd:"}{it:string2}{cmd:"}{c )-}
{ifin}
[{cmd:,} {opth g:enerate(newvar)} {opth max:dist(integer)}]


{title:Description}

{pstd}
{cmd:ustrdist} calculates the distance between strings and/or string
variables using the Levenshtein distance metric. Levenshtein distance, or edit
distance, is the smallest number of edits required to make one string 
match a second string. An edit may be an insertion, deletion, or 
substitution of any single letter.

{pstd}
{cmd:ustrdist} accepts two arguments, which may be string variables or 
string scalars in any combination. String scalars must be enclosed in quotes. 

{pstd}
Edit distances are returned in a scalar or a new variable, depending on
the type of arguments supplied. If the arguments contain one or two string 
variables, edit distances are returned in a new variable with default 
name {bf:ustrdist}. If both arguments are string scalars, edit distance 
is returned in {bf:r(d)}.

{pstd}
Unicode characters are supported. 

{title:Options}

{dlgtab:Main}

{phang}
{opth generate(newvar)} Create a new variable named {it:newvar} containing 
edit distance(s). If the arguments include a string variable without
the {opt generate()} option, a new variable will be created with
default name {bf:ustrdist}.

{phang}
{opth maxdist(integer)} Calculate string distances only up to an upper bound.
If the string distance exceeds that bound, the result is set
to missing (.). Only values of 1 or higher are valid upper bounds.

{title:Examples}

{phang}{cmd:. ustrdist "cat" "hat"}

{phang}{cmd:. sysuse census} 

{phang}{cmd:. ustrdist state "west virginia", gen(wvdist)} 


{title:Saved results}

{pstd}
{cmd:ustrdist} saves the following in {cmd:r()}:


{synoptset 15 tabbed}{...}
{p2col 5 15 19 2: Scalars}{p_end}
{synopt:{cmd:r(d)}}edit distance if arguments are both string scalars{p_end}

{synoptset 15 tabbed}{...}
{p2col 5 15 19 2: Macros}{p_end}
{synopt:{cmd:r(ustrdist)}}name of new edit distance variable if created{p_end}
{p2colreset}{...}


{title:Author}

{pstd} Michael Barker {p_end}
{pstd} Georgetown University {p_end}
{pstd} mdb96@georgetown.edu {p_end}

{pstd} Felix P{c o:}ge {p_end}
{pstd} Max Planck Institute for Innovation and Competition {p_end}
{pstd} felix.poege@ip.mpg.de {p_end}

We thank Sergio Correia for his suggestions on how to improve strdist's performance.

{title:Also see}

{pstd}
{help f_soundex:soundex},
{help strgroup:strgroup}


